





DOCUMENT RESUME

ED 057 086                                              TM 000 929

AUTHOR          Thompson, Raymond E.
TITLE           Investigations of the Appropriateness of the College
                Board Science Achievement Tests for Students of
                Different High School Science Courses.
INSTITUTION     Educational Testing Service, Princeton, N.J.
REPORT NO       TDR-71-2
PUB DATE        Sep 71
NOTE            79p.

EDRS PRICE      MF-$0.65 HC-$3.29
DESCRIPTORS     *Achievement Tests; Biology; Chemistry; Course
                Evaluation; *Educational Background; *High School
                Students; *Performance Factors; Physics; Rating
                Scales; *Science Courses; Science Tests; Statistical
                Analysis; Teacher Attitudes; Test Bias; Test
                Interpretation; Test Results; Test Reviews
IDENTIFIERS     *College Board Science Achievement Tests

ABSTRACT
     Results of teacher ratings of test [...] of students, and analyses
of achievement test scores indicate that the College Board Science
Achievement Tests are now equally appropriate for students in both
regular and special courses in biology, chemistry, and physics. (MS)






COLLEGE ENTRANCE EXAMINATION BOARD 
Research and Development Reports 
RDR-71-72, No. 2



INVESTIGATIONS OF THE APPROPRIATENESS OF THE COLLEGE BOARD SCIENCE 
ACHIEVEMENT TESTS FOR STUDENTS OF DIFFERENT HIGH SCHOOL SCIENCE COURSES 



Raymond E. Thompson, Science Department, Test Development Division, ETS 



Test Development Report 
TDR-71-2, September 1971 



Educational Testing Service 
Princeton, New Jersey 
Berkeley, California







TABLE OF CONTENTS

Acknowledgments

I. Summary

II. Context
    A. Background
    B. Approach
    C. Implications for Achievement Testing and Course Development

III. Purpose

IV. Design
    A. Identification of Examinees in Different Course Categories
    B. Checks on the Accuracy of Examinee Responses Regarding the
       Courses They Studied
    C. Ratings of the Appropriateness of Test Questions for Students of
       Different Courses by Teachers of the Courses
    D. Actual Test Performance of Examinees in Different Courses
    E. Adjusted Test Performance of Examinees in Different Courses

V. Data and Discussion
    A. Numbers and Characteristics of Examinees in Different Course
       Categories
    B. Accuracy of Examinee Responses Regarding Courses Studied
    C. Ratings of the Appropriateness of Test Questions for Students of
       Different Courses by Teachers of the Courses
    D. Actual Performance of Examinees in Different Courses on:
       1. Complete Science Achievement Tests
       2. Subtests of the Achievement Tests Rated Appropriate for
          Different Courses
       3. Scholastic Aptitude Test (SAT)
    E. Adjusted Performance of Examinees in Different Courses on the
       Science Achievement Tests after Taking Account of Performance
       on Subtests of Achievement Tests Rated Appropriate for
       Different Courses and SAT

VI. Conclusions

List of Figures

List of Tables

Appendices
    I. Science Achievement Tests Question Rating Form
    II. Directions for Rating Test Questions
    III. Total Numbers of Examinees Taking Each Science Achievement
         Test per School Year
    IV. Numbers and Percentages of Examinees in Different Courses
    V. Mean Test Scores of Students in Different Course Categories

Bibliography










Investigations of the Appropriateness of the College Board Science Achievement Tests 
for Students of Different High School Science Courses 



I. SUMMARY 

The appropriateness of the College Board Achievement Tests in Biology, Chemistry,
and Physics for students of different high school science courses was investigated. These
investigations involved six Biology, twelve Chemistry, and eight Physics Achievement
Test Forms introduced in the 1960's. The comparative fairness of the tests for various
pairs of courses was analyzed. These pairs were: BSCS¹-Blue and Regular Biology,
BSCS-Green and Regular Biology, and BSCS-Yellow and Regular Biology; CBA² and
Regular Chemistry and CHEMS³ and Regular Chemistry; and PSSC⁴ and Regular Physics.

For most test forms three indications of their appropriateness for the courses in
each relevant pair were secured. The first was ratings of test questions by teachers
of the two courses. The second indication came from the actual performance of students
of the two courses on the complete science achievement test, on questions in the test
rated appropriate for both courses by teachers, on the College Board Scholastic
Aptitude Test-Verbal Section (SAT-V), and on the SAT-Mathematical Section (SAT-M).
The third indication came from an analysis of the achievement test scores of students
of the two courses after adjusting for their differing performances on the concomitant
measures of the appropriate-for-both questions, SAT-V, and SAT-M.

In interpreting the three indications of appropriateness, the following standards were
taken to show bias: (1) mean teacher ratings differing significantly at the .05 level;
(2) actual means on the achievement tests differing in one direction with actual means
on the concomitant measures differing in the opposite direction; and (3) adjusted
achievement test means differing by more than 15 scale score points after controlling
for performance on the concomitant measures through the analysis of covariance.

¹ Biological Sciences Curriculum Study
² Chemical Bond Approach Project
³ Chemical Education Material Study
⁴ Physical Science Study Committee

In biology, teacher ratings indicated bias in favor of Regular Biology in the first two
test forms studied. These indications of bias were not borne out by the test performance
of students, however. In fact, in biology the lone indication of bias revealed by analysis
of student performance showed that one of these first two test forms was biased in favor
of BSCS-Yellow. On the last four test forms investigated, there was only one indication
of bias; that came from teacher ratings and indicated that a test form introduced in 1966
was biased in favor of BSCS-Yellow.

In chemistry, there were indications of bias in seven of the twelve test forms as
follows:

Form introduced in:   Biased in favor of:    Evidence from:

1961                  regular over CHEMS     actual performance
1962                  regular over CBA       actual performance
1963                  regular over CBA       teacher ratings, actual and adjusted performance
                      regular over CHEMS     teacher ratings, actual and adjusted performance
1964                  regular over CBA       teacher ratings
                      regular over CHEMS     teacher ratings
1964                  regular over CBA       teacher ratings and adjusted performance
                      regular over CHEMS     teacher ratings and adjusted performance
1966                  regular over CHEMS     [...] performance
1966                  CHEMS over regular     teacher ratings

Teacher ratings on each course were not secured for forms prior to 1963; performance data
are not available for the first 1964 form listed above. For the last five test forms studied,
only the two indications of bias on 1966 forms noted above were revealed.

In physics, there were indications from teacher ratings of bias in favor of Regular Physics
on two test forms, one introduced in 1965 and one in 1966. Actual performance data supported
the ratings for the 1965 form, but not the 1966 form. Adjusted performance data indicated
no bias in any of the eight forms investigated.

The evidence supports the general conclusion that the tests are now equally appropriate
for students of regular and special courses in biology, chemistry, and physics.

II. CONTEXT



The problem investigated was: Do students who are equivalent in scholastic aptitude and 
science ability but who have studied different high school science courses in a given subject 
earn equivalent scores on the College Board Science Achievement Test in that subject? 

A. Background 

Beginning in the late 1950's and continuing to the present, several new high
school science courses were developed by teams of scientists and science
teachers. Since the new courses were thought by many to represent departures
from existing courses in terms of content, approach, and emphasis, there was
and is interest in and concern about the appropriateness of the College Board
Science Achievement Tests for students of these new courses.

The first science course to attract substantial numbers of students who
later took a College Board Science Achievement Test was the physics course
developed by the Physical Science Study Committee (PSSC). This course was
soon followed by the two chemistry courses developed by the Chemical Bond
Approach Project (CBA) and the Chemical Education Material Study (CHEMS),
and the three biology courses known as the Blue, Green, and Yellow Versions
developed by the Biological Sciences Curriculum Study (BSCS).

Fornoff (1962) noted that the performances of the first PSSC students who
took the Physics Achievement Test, in March 1958, showed that the test did
not adequately measure their achievement. The Board's Committee on
Examinations, therefore, authorized special physics tests for PSSC students.
Such special tests were offered in March of 1959, 1960, 1961, 1963, and in
December of 1961. Fornoff pointed out that the existence of two physics tests
solved the problem of not providing a test closely matched to course objectives,
but it introduced two others. The first problem introduced was that equal scores
on the two tests did not necessarily represent equal achievement in physics; the
second problem was that some students took the wrong test, and such mistakes
could not readily be detected and corrected.

B. Approach

In order to avoid these problems, special tests were not developed for students 
of any of the courses that followed PSSC. Instead, the approach has been to develop 
single tests in each subject that would be appropriate for students of different 
courses. Principles that guide the committees of examiners in the development 
of such tests include the following: Most of the questions in a test deal with topics 
given major emphasis in most courses. The measurement of abilities that should 
be developed in all science courses is emphasized. Questions on topics more 
likely to be taught in one course are balanced by other questions on topics more 
likely to be treated in other courses. Questions on topics that may be unfamiliar 
to students of some courses are presented with background information so that a 
good science student should be able to answer them even though he has not studied
the topics in detail. In many instances, the committees of examiners have data 
on the difficulty and discriminating power of proposed questions for students of 
different courses secured through the pretesting of questions. 

C. Implications for Achievement Testing and Course Development 

Coffman (1971) notes that colleges may require applicants to submit scores 
on certain College Board Achievement Tests for one or more of the following 
purposes: (1) to aid in certifying that a candidate has or has not achieved a level
of competence in a subject; (2) to assist in placing students in a college course
sequence; and (3) to make predictions of college performance in combination with
other information. These purposes are probably best served by a single test in
each of the three major high school science subjects for the following reasons.
Most colleges are concerned about student competency in a science subject,
e.g., biology, rather than in a specific biology course, e.g., BSCS (Yellow
Version). Most colleges do not have different sequences of courses for students
of different high school courses in that subject. Finally, there is no reason to
believe that special achievement tests designed for particular courses would
make any greater contribution to the prediction of general college performance
than single achievement tests designed for students of all courses in that subject.

One may ask why students of a particular course should not earn higher scores
on a test than students of another course. After all, is not the purpose of developing 
new courses the improvement of the science achievement of students? Tests could 
be developed that focused on the unique content of a particular course. No doubt 
students of that particular course would be at an advantage on such tests. But the 
College Board Science Achievement Tests cannot be so developed, if they are to 
serve their purposes. They must focus primarily on that broad range of content 
that is common to all widely used courses. One price that must be paid in order 
to realize those purposes is that the unique content of each avant-garde course 
cannot be immediately included in the tests. But this seems a small and legitimate 
price to pay to meet the purposes of these tests. 

For an outline of the issues discussed in this section see Angoff (1971).




III. PURPOSE 

The purpose of these investigations was to evaluate the appropriateness of the 
College Board Science Achievement Tests for students of different high school science courses. 
The purpose was not to find out if one course was better than another. Instead, the purpose was 
to determine if students of a given course were at a disadvantage on the relevant science
achievement test.

IV. DESIGN

Six Biology, twelve Chemistry, and eight Physics Achievement Test Forms were studied.
The comparative appropriateness of each test form for relevant pairs of courses was investigated.
The pairs of courses were: BSCS-Blue and Regular Biology, BSCS-Green and regular,
BSCS-Yellow and regular; CBA and Regular Chemistry, CHEMS and regular; and PSSC and
Regular Physics.

For most test forms three measures of appropriateness were obtained. These measures
were: (1) ratings of achievement test questions by teachers of both courses; (2) actual
performance of students of both courses on the achievement test and on concomitant variables;
and (3) adjusted achievement test scores resulting from controlling for performance on the
concomitant variables. The concomitant variables were: subsets of the achievement tests rated
highly and equally appropriate for both courses, and the Scholastic Aptitude
Test-Verbal and Mathematical Sections.

The following indications of bias were adopted: (1) mean teacher ratings differing
significantly at the .05 level; (2) actual means on the achievement tests differing in one
direction with actual means on the concomitant variables differing in the opposite direction; and
(3) adjusted achievement test means differing by more than 15 scale score points after
controlling for performance on the concomitant variables through the analysis of covariance.

Dyer, Levine, [...] Malcolm (1962) was [...] in the initial design of these investigations.
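The three indications of bias adopted above can be sketched as a small decision rule. This is only an illustration under stated assumptions: the `Comparison` container, the function name, the use of a precomputed p-value for the teacher-rating test, and the reading of indication (2) as "every concomitant mean differs in the direction opposite to the achievement mean" are all hypothetical and are not taken from the report.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Comparison:
    """Summary figures for one test form and one pair of courses (hypothetical)."""
    rating_p_value: float           # p-value for the difference in mean teacher ratings
    actual_diff: float              # achievement mean, course X minus course Y
    concomitant_diffs: List[float]  # X-minus-Y mean differences on each concomitant measure
    adjusted_diff: float            # X-minus-Y difference in covariance-adjusted means


def bias_indications(c, alpha=0.05, max_adjusted_gap=15.0):
    """Return which of the three indications of bias are present."""
    ratings = c.rating_p_value < alpha
    # Indication 2: achievement means differ one way, concomitant means the other way.
    opposite = (
        c.actual_diff != 0
        and bool(c.concomitant_diffs)
        and all(d != 0 and (d > 0) != (c.actual_diff > 0) for d in c.concomitant_diffs)
    )
    adjusted = abs(c.adjusted_diff) > max_adjusted_gap
    return {
        "teacher_ratings": ratings,
        "actual_performance": opposite,
        "adjusted_performance": adjusted,
    }
```

A comparison in which course X scores higher on the full test but lower on every concomitant measure would be flagged under "actual_performance", whatever the size of the differences.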







Biology Test 

The question below pertains to whether or not you have taken some special courses in biology 
which are offered by some schools. You are to indicate your answer by blackening ONE and 
ONLY ONE space in the group of nine spaces labeled Q on your answer sheet. Read the 
question and the statements under it and blacken the space that corresponds to the statement 
that applies to you. Your answer to this question will be used for research purposes only 
and will not influence your score on the test. 

Question: Have you taken, or are you now taking, one of the courses
          in biology developed by the Biological Sciences Curriculum
          Study (BSCS)?

NOTE: If you were in a BSCS course, you either used
      paper-covered textbooks which had the symbol
      shown at the right imprinted on the covers or
      you used one of the hard-covered textbooks
      whose titles are given below. If you used the
      paper-covered books, the color of the covers
      indicated the version of the course you took.

Statements

Space 1. Yes, I took the Green Version of the BSCS course.
         (Title of hard-covered edition: High School Biology)

Space 2. Yes, I took the Yellow Version of the BSCS course.
         (Title of hard-covered edition: Biological Science:
         An Inquiry into Life)

Space 3. Yes, I took the Blue Version of the BSCS course.
         (Title of hard-covered edition: Biological Science:
         Molecules to Man)

Space 4. Yes, I took a BSCS course but I am not sure of the version.

Space 5. I am not sure whether or not I took a BSCS course.

Space 6. No, I have not taken a BSCS course.

Spaces 7-9. These spaces are to be left blank.

Chemistry Test

In the group of nine spaces labeled Q, you are to blacken ONE and ONLY ONE space, as 
described below, to indicate how you obtained your knowledge of chemistry. The information 
that you provide will not influence your score on the test. 



Space 1. I am now taking, or have taken, the chemistry
         course known as the Chemical Bond Approach
         Course (CBA). (If this applies to you, the
         symbol shown at the right will be familiar to
         you and you will have used either paper-covered
         textbooks with pages the same size as those in
         this test booklet or a hard-covered textbook
         titled Chemical Systems.)

Space 2. I am now taking, or have taken, the chemistry
         course known as the Chemical Education
         Material Study Course (CHEM Study). (If this
         applies to you, the symbol shown at the right
         will be familiar to you and you will have used
         either paper-covered textbooks with pages the
         same size as those in this test booklet or a
         hard-covered textbook titled Chemistry: An
         Experimental Science.)

Space 3. I am not sure if I am taking, or have taken,
         either the CBA or the CHEM Study Course.

Space 4. I am not taking, or have not taken, either
         the CBA or the CHEM Study Course.

Spaces 5-9. These spaces are to be left blank.

[Course symbols not reproduced; legible labels: Covalent, Ionic, Metallic]

Physics Test

To provide information on your training in physics, please read the following statements and
select the one statement that applies to you. Then, in the group of nine spaces labeled Q on
your answer sheet, blacken the one space whose number corresponds to that of the statement
you selected. The information that you are asked to supply is for research purposes only and
will have no effect on your test score.

Space 1. I am now taking, or have taken, the physics
         course prepared by the Physical Science Study
         Committee (PSSC). (If you took this
         course, you used a textbook with a stroboscopic
         photograph of a bouncing ball on
         its cover.)

Space 2. I am not sure if I am taking, or have
         taken, the PSSC course.

Space 3. I am not taking, or have not taken,
         the PSSC course.

Spaces 4-9. These spaces are to be left blank.

The following chart indicates which courses were included in these studies.



Subject      Courses                  Space Marked by Student
                                      on Test Cover

Biology      BSCS, Blue Version       3
             BSCS, Green Version      1
             BSCS, Yellow Version     2
             Regular Biology          6

Chemistry    CBA                      1
             CHEM Study               2
             Regular Chemistry        4

Physics      PSSC                     1
             Regular Physics          3

Examinees were categorized as students of the regular course if they indicated that they were
not students of the "special" courses.
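The chart above amounts to a lookup from subject and marked space to course category, which can be sketched as follows. The dictionary and function names are hypothetical, and the "Unclassified" label for any response the chart does not list (unsure, blank, or a space meant to be left empty) is an assumption made for this example rather than the report's own terminology.

```python
# Spaces that identify each course, transcribed from the chart above.
COURSE_BY_SPACE = {
    "biology": {3: "BSCS-Blue", 1: "BSCS-Green", 2: "BSCS-Yellow", 6: "Regular Biology"},
    "chemistry": {1: "CBA", 2: "CHEM Study", 4: "Regular Chemistry"},
    "physics": {1: "PSSC", 3: "Regular Physics"},
}


def categorize(subject, space):
    """Map the space an examinee marked to a course category.

    `space` is None when the examinee did not respond. Any response the
    chart does not list is returned as "Unclassified" (an assumed label).
    """
    return COURSE_BY_SPACE[subject].get(space, "Unclassified")
```

For instance, `categorize("chemistry", 4)` gives the regular-course category, while a physics examinee who marked space 2 ("not sure") falls outside the compared pair.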

B. Checks on the Accuracy of Examinee Responses Regarding the Courses They Studied

There has been concern about the accuracy of the examinee responses regarding
courses studied. Several checks on the accuracy of these responses have been made.
These checks have always involved asking staff members of schools attended by
examinees to provide information on the science courses the examinees have studied.
In most of these checks a list of examinees was sent to each school along with the
courses the examinees said they studied. The schools were asked to verify or reject
the examinees' claims.

One check focused on examinees who said they were not sure about their course,
as well as those who made uninterpretable responses not in keeping with the directions,
and those who failed to respond.

Since most of the comparisons of performance were based on groups of examinees
identified solely on the basis of examinee responses, there were understandably some
qualms about the validity of these comparisons. Hence, for the January and May tests
of 1967 in all three subjects complete analyses of performance for samples based on
student responses and samples based on school verifications were carried out.

C. Ratings of the Appropriateness of Test Questions for Students of Different Courses by
Teachers of the Courses

These ratings of appropriateness by the teachers actually served two purposes:
(1) identification of subsets of items in each science achievement test that could
serve as an unbiased measure of achievement in that science; and (2) direct judgments
of the appropriateness of tests for different courses.

The following procedure was used in securing the ratings and identifying the 
questions appropriate for both courses under comparison in all investigations except 
the early ones in chemistry. About ten teachers of course X and ten teachers of 
course Y at a rating conference were directed to answer all of the questions in the 
test (to insure careful consideration of questions) and to record their ratings of the 
appropriateness of each question for the course they represented. A teacher was asked 
to give each question one of the following three ratings: appropriate and emphasized, 
appropriate but not emphasized, or inappropriate. The numbers 2, 1, or 0, respec- 
tively, were associated with the ratings. 

Appendix II contains a copy of the written directions for rating questions that was
given to the teachers.

In order to serve as a rater of the appropriateness of test questions for course X,
a teacher had to be actively engaged in teaching course X. Ordinarily a rater was a
teacher of only course X. A few raters were active teachers of courses X and Y;
but such teachers provided ratings for only one of those two courses.

The questions selected for the unbiased measure of achievement were those rated 
uniformly and equally high by both X and Y teachers. The selected questions were 
those that had ratings with high means and low variances for course X or course Y 
teachers considered separately, and equal mean ratings for both courses. At least 

20 questions were selected for most test forms studied. 
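The selection rule just described (high mean rating, low variance within each teacher group, and equal means across the two groups) can be sketched as below. The function name and all three threshold values are assumptions chosen for illustration; the report does not state the cutoffs it used.

```python
from statistics import mean, pvariance


def select_common_questions(ratings_x, ratings_y, min_mean=1.5,
                            max_variance=0.5, max_mean_gap=0.2):
    """Pick questions rated uniformly and equally high by both teacher groups.

    ratings_x / ratings_y map a question number to the list of 0/1/2 ratings
    given by course X and course Y teachers. The thresholds are illustrative
    assumptions, not values from the report.
    """
    selected = []
    for q in sorted(ratings_x.keys() & ratings_y.keys()):
        rx, ry = ratings_x[q], ratings_y[q]
        high = mean(rx) >= min_mean and mean(ry) >= min_mean
        uniform = pvariance(rx) <= max_variance and pvariance(ry) <= max_variance
        equal = abs(mean(rx) - mean(ry)) <= max_mean_gap
        if high and uniform and equal:
            selected.append(q)
    return selected
```

A question that one group rates 2 throughout but the other rates mostly 1 or 0 fails the "equal means" check and is left out of the common subset.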

If the appropriateness of a test for courses X, Y, and Z was under study, one
subset of questions appropriate for both X and Y was used in comparing the
appropriateness of the full test for X and Y students. Another subset of questions
appropriate for X and Z was used in comparing the appropriateness of the complete test
for X and Z students. Usually there was considerable overlap between the subsets.

For the early investigations in chemistry, appropriateness ratings were obtained
from CBA and CHEMS teachers only. It was assumed that the questions were appro- 
priate for regular students and hence no ratings from teachers of regular courses were 
secured. In the early chemistry investigations, the raters were instructed to rate each 
question as either appropriate or inappropriate for the course they represented. Each 
rating of appropriate was assigned a value of 1 and each rating of inappropriate was 
assigned a value of 0. Then substantial numbers of questions that had relatively high 
appropriateness ratings were selected for the unbiased measure of achievement in 
chemistry. 

D. Actual Test Performance of Examinees in Different Courses

For most of the comparisons, the scores of examinees in two courses on (1) a
complete science achievement test, (2) a subset of the achievement test rated equally
appropriate for both courses, and (3) the Scholastic Aptitude Test were used. Several
of the comparisons in chemistry took no account of SAT performance. All of the
comparisons except for the first two in chemistry were for science achievement tests given
in January or May.

E. Adjusted Test Performance of Examinees in Different Courses

The final goal was to obtain the mean test scores on a given science achievement 

test form to be expected of two groups of examinees from two different courses who 

were equivalent in science ability and in scholastic aptitude. The statistical technique 
for obtaining these mean scores was the analysis of covariance. 
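As a rough illustration of what adjusted means are, the sketch below computes covariance-adjusted group means using a pooled within-group regression slope. It is a simplification under stated assumptions: it uses a single covariate, whereas the report controlled for several concomitant measures at once, and the function name and data layout are hypothetical.

```python
from statistics import mean


def adjusted_means(groups):
    """Covariance-adjusted means for several course groups, one covariate.

    `groups` maps a course name to a list of (covariate, achievement) pairs.
    Each group mean is moved to the value expected if the group had the
    grand mean on the covariate.
    """
    grand_x = mean(x for pairs in groups.values() for x, _ in pairs)
    sxy = sxx = 0.0
    for pairs in groups.values():
        mx = mean(x for x, _ in pairs)
        my = mean(y for _, y in pairs)
        sxy += sum((x - mx) * (y - my) for x, y in pairs)
        sxx += sum((x - mx) ** 2 for x, _ in pairs)
    slope = sxy / sxx  # pooled within-group regression slope
    return {
        name: mean(y for _, y in pairs)
        - slope * (mean(x for x, _ in pairs) - grand_x)
        for name, pairs in groups.items()
    }
```

Two groups whose raw achievement means are close can show a larger adjusted gap when the group with the higher covariate mean has the lower achievement mean.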

According to Lindquist (1953) there are seven conditions to be met in the analysis
of covariance before valid tests of the significance of differences between adjusted
mean scores can be made. These seven conditions are:

1. Students in both course samples are drawn at random from the same
   parent population or selected from the same parent population only
   on the basis of scores on the concomitant variables, i.e., the
   appropriate-for-both questions and SAT.

2. Scores of students on the appropriate-for-both questions and SAT
   are not differentially affected by the courses the students studied.

3. The achievement test scores for both course samples are random
   samples of those for corresponding course populations.

4. The regression of achievement test scores on appropriate-for-both
   scores and SAT scores is the same for both course populations.

5. This regression is linear.

6. The distributions of achievement test scores for each of the two course
   populations are normal.

7. Both of these distributions have the same variance.

A discussion of how well the experimental design met each of these conditions follows:

1. With respect to the first condition, it could be assumed that a parent
   population consists of all students who take a given science achievement
   test form. The students in the regular and special course samples for
   this test were not drawn at random from this parent population nor were
   they selected from it only on the basis of scores on the appropriate-for-both
   questions and SAT. Students ended up in the regular or special course
   populations for the test on the basis of their indications of the courses
   they studied.

2. The second condition is that performances on the appropriate-for-both
   questions and on SAT are not differentially affected by the courses students
   studied. These effects are probably slight. If questions are judged by
   teachers of both courses to be appropriate for both courses, achievement
   on these questions is probably not differentially affected to an appreciable
   degree by the courses studied. Differential effects of courses on SAT
   performance are likely to be even smaller.

3. The third condition is that the achievement test scores for both samples
   in a comparison are random samples of those for the corresponding
   populations. This condition is contained in 1. If condition 1 is met, then
   condition 3 is surely met. Meeting of condition 3 is necessary but not
   sufficient for meeting condition 1. The regular and special course samples
   were drawn systematically from regular and special course populations,
   respectively. Tests were made of the significance of differences between
   population and corresponding sample means. Even more important for a
   given comparison between course X and course Y was the following
   consideration: The difference between the X sample and population means
   should not differ significantly from the difference between the Y sample
   and population means. Figure 1 illustrates the criticality of this
   consideration.



FIGURE 1

Examples of Various Relationships Between Course X Sample and Population Means and
Course Y Sample and Population Means

[Figure: Examples 1, 2, and 3, each showing the relative positions of the Y sample mean,
Y population mean, X sample mean, and X population mean.]

   The comparisons based on samples in examples [...]. In example [...] the
   difference between the X sample and population means is markedly different
   from the difference between the Y sample and population means. Here the
   comparison based on samples will be invalid. Comparisons in which the
   difference between the regular sample and population means was significantly
   different from the difference between the special sample and population
   means at the .01 level were ruled out of consideration.

4. Two different tests of condition 4 were made for each comparison. All
   comparisons in which errors of estimate or [...] for the two samples
   differed significantly at [...] were ruled out of consideration.

5 and 6. Conditions 5 and 6 will be assumed.

7. [...] satisfaction of condition 7 was [...] of estimate about the
   regression lines for the two course samples under comparison.

[...] correlations between achievement test scores and scores on the concomitant
variables were found [...] samples in each comparison.

V. DATA AND DISCUSSION

The data compiled will be presented and discussed in the following order:

A. Numbers and Characteristics of Examinees in Different Course Categories

B. Accuracy of Examinee Responses Regarding Courses Studied

C. Ratings of the Appropriateness of Test Questions for Students of Different Courses
   by Teachers of the Courses

D. Actual Performances of Examinees in Different Courses on:

   1. Complete Science Achievement Tests

   2. Subtests of the Achievement Tests Rated Appropriate for Different Courses

   3. Scholastic Aptitude Test (SAT)

E. Adjusted Performances of Examinees in Different Courses on the Science
   Achievement Tests After Taking Account of Performances on Subtests of
   Achievement Tests Rated Appropriate for Different Courses and SAT

A. Numbers and Characteristics of Examinees in Different Course Categories

The total numbers of examinees taking the Biology, Chemistry, and Physics 
Achievement Tests in recent school years are given in Appendix III. 

The numbers of examinees studying the different courses in each science 
are given in Appendix IV. These data and some additional data all drawn from 
Swineford's statistical reports ( 1962- 1969) are depicted in Figures 2-4. 

In these three graphs the uncertains included those examinees who indicated 
that they were not sure about the course they took as well as those who failed 
to respond and a few who marked spaces that were to be left blank. Those 
biology examinees who indicated they studied a BSCS course but were uncertain 
of the version were included in the BSCS (All Versions) category. 



Numbers of Biology Examinees in Different Biology Courses

Numbers of Chemistry Examinees in Different Chemistry Courses

Figures 2-4 reveal that more students take the Biology and Chemistry Tests in
May than in January, and that for biology, chemistry, and physics, students in
special courses constitute a larger proportion of the examinee group in May
than in January. Data presented by Fornoff, Kastrinos, and Thompson (all 1969)
show that compared to the January examinees the May examinees in all three
subjects include more juniors, more students from independent schools, more
residents of the Northeast, and more students of the new courses.

One generalization supported by the data is that the number of examinees
who have studied the leading new courses in each subject has been increasing,
whereas the number who have studied the regular courses has been decreasing.

The regular course category is a catch-all for all examinees who did not think
they took one of the new courses that are specifically named. The regular courses
are not as diverse as one might think, however. Drawing on data reported by
Fornoff, Kastrinos, and Thompson (all 1969), Table 1 on textbooks
used in regular courses by samples of examinees in 1965-1966 was constructed.

It clearly shows that for both mid-school-year (1965-1966) senior examinees
and end-of-school-year (1965-1966) junior examinees in biology, chemistry,
and physics the Holt, Rinehart and Winston series of textbooks titled Modern
Biology, Modern Chemistry, and Modern Physics, respectively, were the
predominant texts in regular courses.



TABLE 1

Textbooks Used in Regular Courses by Samples of Mid-School-Year (1965-66)
Senior Examinees and End-of-School-Year (1965-66) Junior Examinees
in Biology, Chemistry, and Physics

                                                  Mid-Year Senior           End-of-Year Junior
                                                  Examinees                 Examinees

            Leading                               Total No.   Percentage    Total No.   Percentage
            Regular-Course                        Reporting   Using Each    Reporting   Using Each
Subject     Textbooks                             Reg. Text                 Reg. Text

Biology                                           1,554                     598
            1. Modern Biology                                 [?]3%                     36%
            2. Exploring Biology. Science                     5                         6
               of Living Things

Chemistry                                         1,331                     6[?]8
            1. Modern Chemistry                               56                        [illegible]
            2. Chemistry and You                              5                         [illegible]

Physics                                           1,312                     706
            1. Modern Physics                                 42                        30
            2. Physics, an Exact Science                      [illegible]               2
            3. Exploring Physics                              3                         3

[?] and [illegible] mark characters that cannot be read in the source copy.

B. Accuracy of Examinee Responses Regarding Courses Studied

Several checks were made on the accuracy of the examinee responses to the 
questions about courses studied that appeared on the front covers of the tests. 

The first data on this matter were compiled by Stickell (1965) for the
BSCS biology students who took the May 1963 Biology Test and are shown in
Table 2.

TABLE 2

Accuracy of Examinee Responses Regarding BSCS Biology Courses Studied
May 1963 Biology Test

               Number of        Number Whose     Number Whose     Number Whose
               Examinees Who    Responses Were   Responses Were   Responses Were
               Said They        Verified by      Denied by        Neither Verified   Percentage
Course         Studied Course   Their Schools    Their Schools    Nor Denied         Agreement*

BSCS Blue      250              226              11               13                 95%
BSCS Green      95               55              28               12                 66
BSCS Yellow    170              123              43                4                 74

*The percentage agreement was found by dividing the number whose responses were verified by the
number whose responses were either verified or denied and multiplying by 100.
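
The footnote's rule can be checked with a few lines of Python. This is only a sketch: the function name and the rounding to a whole percent are assumptions made for illustration, while the verified and denied counts are the ones reported in Table 2.

```python
def percentage_agreement(verified: int, denied: int) -> int:
    """Verified responses as a whole-number percentage of those verified or denied."""
    decided = verified + denied
    if decided == 0:
        raise ValueError("no responses were verified or denied")
    # Responses that were neither verified nor denied are left out of the base.
    return round(100 * verified / decided)


# (verified, denied) counts from Table 2, May 1963 Biology Test.
table2 = {
    "BSCS Blue": (226, 11),
    "BSCS Green": (55, 28),
    "BSCS Yellow": (123, 43),
}
agreement = {course: percentage_agreement(v, d) for course, (v, d) in table2.items()}
```

Because unresolved responses are excluded from the denominator, the percentage describes only the examinees whose schools gave a definite answer.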



The next data on the accuracy of student responses were compiled for the 
May 1963 and January 1964 Physics Tests; these data are shown in Table 3. 



















Accuracy of Examinee Responses Regarding Physics Courses Studied: May 1963 and January 1964 Physics Tests

The most extensive check on the accuracy of student responses regarding courses
studied was made for the Biology, Chemistry, and Physics Tests given in January and 
in May of 1967. The data from this check are presented in Table 4. 



TABLE 4

Accuracy of Examinee Responses Regarding Science Courses Studied:
January and May 1967 Biology, Chemistry, and Physics Tests

                                  Number in a      Number Whose     Number Whose     Number Whose
                                  Sample of        Responses Were   Responses Were   Responses Were
                                  Examinees Who    Verified by      Denied by        Neither Verified
Test                              Said They        Their Schools    Their Schools    Nor Denied         Percentage
Date        Course                Studied Course                                                        Agreement*

Jan. 1967   BSCS-Blue             500              378               63              59                 86%
            BSCS-Green            500              343               91              66                 79
            BSCS-Yellow           496              370               50              76                 88
            Regular Biology       498              271              138              89                 66
            CBA                   439              260              115              64                 69
            CHEMS                 500              371               64              65                 85
            Regular Chemistry     500              404               15              81                 96
            PSSC                  500              404               40              56                 91
            Regular Physics       498              392               31              75                 93

May 1967    BSCS-Blue             500              420               35              45                 92
            BSCS-Green            500              375               76              49                 83
            BSCS-Yellow           500              432               33              35                 93
            Regular Biology       499              314               87              98                 78
            CBA                   500              307              131              62                 70
            CHEMS                 500              403               47              50                 90
            Regular Chemistry     500              415               26              59                 94
            PSSC                  500              392               41              67                 91
            Regular Physics       500              392               28              80                 93

*The percentage agreement was found by dividing the number whose responses were verified by the number
whose responses were either verified or denied and multiplying by 100.

The percentage of examinees in a course category who apparently do not belong
there is fairly high in some cases; e.g., 34% in Regular Biology on the January 1967
Biology Test, and 31% in CBA on the January 1967 Chemistry Test. The presence of
these misplaced examinees in the samples could adversely affect the validity of
comparisons of the performance of the groups on the tests. Hence, the analyses of the
1967 tests in all three subjects became of critical importance because for these
tests complete analyses of the performance of student-response samples and of
school-verified samples were carried out. If the results for the school-verified
samples were similar to the results for the student-response samples, then it could
be inferred that the analyses of the earlier tests, based on student-response samples
only, were valid. The results for the 1967 school-verified and student-response
samples are presented later in connection with the complete results for all the
samples.

An investigation of the courses actually studied by examinees who indicated they
were uncertain of their course was made for the May 1964 Physics Test. Of the
8,151 examinees for this test, 1,849 fell in the uncertain category. The uncertains
included 1,165 who specified they were not sure what course they studied, 9 who
made uninterpretable responses not in keeping with the directions, and 675 who
failed to respond at all. A systematic sample numbering 380 was drawn from the
1,849. The schools of 351 of these 380 students were asked to indicate what course
the students studied. The schools of the remaining 29 were not contacted because
school addresses were not available for 26, two were in foreign countries, and one
was a college student. Of the 351 students whose schools were contacted, school
replies were received for 302, of which 24 (8%) were designated as students of the
PSSC Physics Course and 278 (92%) were designated as students of some other
physics course; i.e., the regular course.




C. Ratings of the Appropriateness of Test Questions for Students of Different Courses
by Teachers of the Courses

Each question in each test form was rated by teachers of the relevant courses as 
appropriate and emphasized (2 points), appropriate but not emphasized (1 point), or 
inappropriate (0). An exception to this procedure in chemistry is described below. 

For each question the mean and standard deviation of the ratings were found for 
each relevant course. For each test form the mean and standard deviation of the mean 
question ratings for each relevant course were determined. In Tables 5, 6, and 7 the 
number of teacher raters involved and the mean and standard deviation of their mean 
question ratings for the complete tests and for the highly and equally appropriate 
subsets are presented for biology, chemistry, and physics, respectively. 
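
The two-stage summary described above (a mean rating per question, then the mean and standard deviation of those question means) might be sketched as follows. The ratings shown are hypothetical, and the use of the population standard deviation is an assumption; the report does not say which form was used.

```python
from statistics import mean, pstdev


def summarize_ratings(ratings_by_question):
    """Return (mean, standard deviation) of the per-question mean ratings.

    ratings_by_question holds one list per test question; each list contains
    the 2/1/0 ratings given to that question by the teachers of one course.
    """
    question_means = [mean(ratings) for ratings in ratings_by_question]
    # Assumption: population standard deviation across the question means.
    return mean(question_means), pstdev(question_means)


# Hypothetical ratings from five teachers of one course on three questions.
ratings = [
    [2, 2, 2, 1, 2],  # appropriate and emphasized for most raters
    [1, 1, 0, 1, 2],  # mixed judgments
    [2, 2, 2, 2, 2],  # unanimous
]
test_mean, test_sd = summarize_ratings(ratings)
```

The same function applies unchanged to a subset of questions, which is how the figures for the highly and equally appropriate subsets would be obtained under these assumptions.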

Six of the 18 comparisons in Table 5 show significantly different mean ratings at 
the .05 level. Five of these six significant differences in mean ratings appear on the 
first two tests and all five favor the regular course. The other significant difference 
appears on the fourth test and favors BSCS-Yellow. There are no significant differences 
in mean ratings on the last two tests. In making these tests of significance, the standard 
deviation of mean ratings was assumed to be .40 for the May 1963 test. 
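
The report does not state which significance test was applied to the mean ratings. As one possible reading, the sketch below treats the two sets of question means as independent and uses a large-sample z statistic with a two-sided .05 critical value of 1.96. Both choices are assumptions, so the result need not reproduce every asterisk in Table 5. The input figures are the May 1962 Regular Biology and BSCS-Blue values from Table 5.

```python
import math


def two_sample_z(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Large-sample z statistic for the difference between two independent means."""
    standard_error = math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    return (mean_a - mean_b) / standard_error


# May 1962 Biology Test, 100 questions rated for each course (Table 5).
z = two_sample_z(1.55, 0.40, 100,   # Regular Biology: mean, SD, questions
                 1.31, 0.36, 100)   # BSCS-Blue: mean, SD, questions
significant_at_05 = abs(z) > 1.96   # assumed two-sided critical value
```

Because the same questions were rated for both courses, a paired analysis by question would be an equally plausible reading of the procedure; it would require the per-question ratings, which are not tabulated here.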

Appropriateness ratings for a dozen Chemistry Tests are presented in Table 6. For 
the first three tests, only ratings from CBA and CHEMS teachers were obtained; it was 
assumed that the tests were appropriate for the regular course. Questions in these 
first three tests were rated appropriate (1) or inappropriate (0). For the last nine
tests, ratings on the 2, 1, 0 scale from CBA, CHEMS, and regular teachers are 
presented. 

If one considers the last nine Chemistry Tests in Table 6; i.e., beginning with the
May 1963 test, there are 18 paired course comparisons (two comparisons for each of 
nine tests). Seven of the 18 comparisons show significantly different mean ratings at 
the .05 level. Six of these seven significant differences in mean ratings appear on the 
first three tests and all six favor regular courses. Only one of the seven significant 
differences appears on the last six tests; that one comparison favors CHEMS. In 



TABLE 5

Biology: Teacher Ratings of Appropriateness

Mean Question Ratings

                                                                   Highly and Equally
                                        Complete Test              Appropriate Subset of Test

                              No. of    Number                     Number
Test       Courses            Teacher   of                 Stand.  of                 Stand.
Date       Compared           Raters    Questions  Mean    Dev.    Questions  Mean    Dev.

May 1962   BSCS-Blue &        5         100        1.31*   .36     19         1.90    .10
           Reg. Biol.         5                    1.55    .40                1.93    .09
           BSCS-Green &       5         100        1.42*   .46     17         1.93    .10
           Reg. Biol.         5                    1.55    .40                1.91    .10
           BSCS-Yellow &      5         100        1.43*   .43     20         1.93    .09
           Reg. Biol.         5                    1.55    .40                1.91    .10

May 1963   BSCS-Blue &        6         100        1.24*           22         1.91    .12
           Reg. Biol.         6                    1.57                       1.87    .12
           BSCS-Green &       6         100        1.51            17         1.92    .08
           Reg. Biol.         6                    1.57                       1.95    .08
           BSCS-Yellow &      3         100        1.39*           20         1.82    .16
           Reg. Biol.         6                    1.57                       1.88    .13

Jan. 1966  BSCS-Blue &        11        100        1.35    .51     26         1.72    .24
           Reg. Biol.         8                    1.33    .38                1.72    .21
           BSCS-Green &       9         100        1.29    .48     17         1.74    .25
           Reg. Biol.         8                    1.33    .38                1.74    .21
           BSCS-Yellow &      11        100        1.44    .38     30         1.73    .18
           Reg. Biol.         8                    1.33    .38                1.73    .18

May 1966   BSCS-Blue &        11        100        1.31    .46     22         1.73    .17
           Reg. Biol.         8                    1.22    .52                1.73    .20
           BSCS-Green &       9         100        1.31    .42     21         1.74    .13
           Reg. Biol.         8                    1.22    .52                1.74    .21
           BSCS-Yellow &      11        100        1.50*   .42     23         1.83    .11
           Reg. Biol.         8                    1.22    .52                1.83    .14

Jan. 1967  BSCS-Blue &        11        100        1.45    .41     34         1.76    .20
           Reg. Biol.         10                   1.44    .37                1.76    .17
           BSCS-Green &       10        100        1.46    .32     32         1.78    .14
           Reg. Biol.         10                   1.44    .37                1.78    .15
           BSCS-Yellow &      11        100        1.49    .29     36         1.74    .15
           Reg. Biol.         10                   1.44    .37                1.74    .16

May 1967   BSCS-Blue &        11        100        1.55    .37     32         1.81    .19
           Reg. Biol.         10                   1.61    .28                1.81    .12
           BSCS-Green &       10        100        1.56    .26     39         1.76    .11
           Reg. Biol.         10                   1.61    .28                1.76    .17
           BSCS-Yellow &      11        100        1.61    .26     50         1.79    .14
           Reg. Biol.         10                   1.61    .28                1.79    .15

*Means significantly different at .05 level
TABLE 6

Chemistry: Teacher Ratings of Appropriateness

Mean Question Ratings

                                                                        Highly and Equally
                                          Complete Test                 Appropriate Subset of Test

                             No. of       Number                        Number
Test       Courses           Teacher      of                   Stand.   of                   Stand.
Date       Compared          Raters       Questions  Mean      Dev.     Questions  Mean      Dev.

Dec. 1961  CBA &             [illegible]  95         .77       .28      76         .89       .14
           Reg. Chem.
           CHEMS &           5            95         .72       .30      74         .86       .16
           Reg. Chem.

Mar. 1962  CBA &             6            100        .65       .35      67         .86       .18
           Reg. Chem.
           CHEMS &           5            100        .62       .36      67         .84       .17
           Reg. Chem.

May 1962   CBA &             5            95         .7[?]     .29      71         .90       .13
           Reg. Chem.
           CHEMS &           6            95         .72       .27      70         .86       .15
           Reg. Chem.

May 1963   CBA &             9            95         1.30*     .51      20         1.82      .09
           Reg. Chem.        10                      1.76      .29                 1.92      .11
           CHEMS &           10           95         1.40*     .48      20         1.83      .13
           Reg. Chem.        10                      1.76      .29                 1.92      .11

Jan. 1964  CBA &             8            95         1.39*     .54      28         1.84      .09
           Reg. Chem.        9                       1.55      .38                 1.82      .14
           CHEMS &           9            95         1.40*     .57      31         1.85      .15
           Reg. Chem.        9                       1.55      .38                 1.85      .12

May 1964   CBA &             9            95         1.24*     .53      20         1.78      .10
           Reg. Chem.        12                      1.56      .36                 1.78      .15
           CHEMS &           12           95         1.4[?]*   .49      30         1.79      .14
           Reg. Chem.        12                      1.56      .36                 1.79      .15

Jan. 1965  CBA &             9            95         1.43      .50      20         1.88      .10
           Reg. Chem.        9                       1.56      .44                 1.88      .10
           CHEMS &           10           95         1.54      .42      20         1.92      .09
           Reg. Chem.        9                       1.56      .44                 1.88      .10

May 1965   CBA &             7            95         1.58      .40      20         1.89      .09
           Reg. Chem.        7                       1.55      .43                 1.90      .11
           CHEMS &           8            95         1.54      .42      20         1.91      .10
           Reg. Chem.        7                       1.55      .43                 1.90      .11

Jan. 1966  CBA &             8            95         1.58      .40      26         1.90      .05
           Reg. Chem.        9                       1.54      .49                 1.90      .10
           CHEMS &           9            95         1.60      .38      29         1.88      .12
           Reg. Chem.        9                       1.54      .49                 1.88      .11

May 1966   CBA &             9            95         1.22      .51      19         1.77      .10
           Reg. Chem.        12                      1.36      .52                 1.77      .20
           CHEMS &           12           95         1.53*     .39      31         1.82      .14
           Reg. Chem.        12                      1.36      .52                 1.82      .14

Jan. 1967  CBA &             9            90         1.60      .43      38         1.85      .11
           Reg. Chem.        11                      1.55      .36                 1.85      .14
           CHEMS &           11           90         1.65      .37      44         1.83      .16
           Reg. Chem.        11                      1.55      .36                 1.83      .15

May 1967   CBA &             9            90         1.58      .42      31         1.87      .11
           Reg. Chem.        11                      1.62      .36                 1.87      .11
           CHEMS &           11           90         1.67      .33      45         1.8[?]    .17
           Reg. Chem.        11                      1.62      .36                 1.84      .13

*Means significantly different at .05 level

[?] and [illegible] mark characters that cannot be read in the source copy.

chemistry there is convincing evidence from teacher ratings that the tests have evolved
so as to be as appropriate for students of CBA and CHEMS courses as for students of
regular courses.

Appropriateness ratings for eight Physics Tests are shown in Table 7. For the
first two tests two different methods were used to pick the appropriate-for-both
questions. In the first method the emphasis was on selecting questions that had PSSC
and regular ratings with high means and low variances. In the second method the
emphasis was on picking questions that had equal PSSC and regular ratings. The
reader will recognize that these were two considerations in picking all sets of
appropriate-for-both questions. For the first two Physics Tests the effect of emphasizing
one or the other of these considerations was studied. The effect was appreciable, as
will be shown and discussed in Part E.

In Table 7 only two of the eight comparisons (January 1965 and May 1966) show 
significantly different mean ratings at the .05 level and both favor the regular course. 

D. Actual Performances of Examinees in Different Courses on:

1. Complete Science Achievement Tests

2. Subsets of the Achievement Tests Rated Appropriate for Different Courses

3. Scholastic Aptitude Test

The actual mean test scores of the examinees in different courses on the Science
Achievement Tests are given in Appendix V. The mean scores in Appendix V are all
on the same College Board Scale, which can extend from 200 to 800. Data on the
actual test performance of samples of those examinees in the different course cate-
gories will be presented next. The mean scores of these samples will be set forth by
subjects. Some of the samples for the early studies were not samples at all, but
simply all the examinees for whom the relevant data could be assembled. For later
studies systematic samples were constituted with answer sheets drawn at evenly
spaced intervals from the answer sheets of all examinees in a given course. From
many of these systematic samples the examinees for whom no SAT scores were
available were excluded, thus yielding reduced systematic samples. In other instances
reduced systematic samples were drawn at evenly spaced intervals from lists of
TABLE 7

Physics: Teacher Ratings of Appropriateness

Mean Question Ratings

                                                                        Highly and Equally
                                          Complete Test                 Appropriate Subset of Test

                             No. of       Number                        Number
Test       Courses           Teacher      of                   Stand.   of                        Stand.
Date       Compared          Raters       Questions  Mean      Dev.     Questions    Mean         Dev.

Emphasis on picking questions whose ratings had high means and low variances
among the 10 raters for each course

May 1963   PSSC &            10           75         1.40      .48      17           1.[?]6       .19
           Reg. Phys.        10                      1.51      .30      17           1.68         .19

Jan. 1964  PSSC &            10           75         1.50      .55      21           1.89         .20
           Reg. Phys.        10                      1.52      .40      21           1.76         .13

Emphasis on picking questions with equal ratings for both courses

May 1963   PSSC &            10                                         19           [illegible]  .23
           Reg. Phys.        10                                                      1.58         .28

Jan. 1964  PSSC &            10                                         17           1.76         .22
           Reg. Phys.        10                                                      1.72         .24

May 1964   PSSC &            12           75         1.43      .46      20           1.78         .14
           Reg. Phys.        10                      1.47      .39                   1.78         .17

Jan. 1965  PSSC &            10           75         1.35*     .54      [illegible]  [illegible]  .15
           Reg. Phys.        9                       1.52      .38                   1.88         .10

Jan. 1966  PSSC &            10           75         1.53      [illegible]  [illegible]  1.88     [illegible]
           Reg. Phys.        9                       1.51      .36                   [illegible]  [illegible]

May 1966   PSSC &            12           75         1.40*     .40      20           1.82         .10
           Reg. Phys.        10                      1.58      .33                   1.82         .17

Jan. 1967  PSSC &            11           75         1.48      .41      22           1.82         .19
           Reg. Phys.        9                       1.59      .34                   1.82         .13

May 1967   PSSC &            11           75         1.54      .49      [illegible]  1.87         .14
           Reg. Phys.        9                       1.59      .36                   1.87         .09

*Means significantly different at .05 level

[?] and [illegible] mark characters that cannot be read in the source copy.



examinees for whom all the relevant data existed. For the Biology Test of May 1963 the 
reductions in the systematic samples for the BSCS courses were due to lack of verifi- 
cation from teachers that examinees studied the course they said they did as well as 
unavailability of SAT scores. For several of the studies in chemistry, achievement 
test performance was adjusted only for performance on subsets of the achievement 
test rated equally appropriate for both courses and not on SAT; in these instances 
the systematic samples suffered no reduction due to the unavailability of SAT scores. 
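
The sampling procedure described above, drawing answer sheets at evenly spaced intervals and then dropping examinees without SAT scores, might look like the following sketch. The fractional spacing, the function names, the serial numbers, and the SAT-availability rule are all illustrative assumptions; the report gives no further detail on how the intervals were chosen.

```python
def systematic_sample(items, sample_size, start=0.0):
    """Pick sample_size items at evenly spaced positions through items."""
    n = len(items)
    if not 0 < sample_size <= n:
        raise ValueError("sample_size must be between 1 and len(items)")
    step = n / sample_size  # spacing between selected positions
    if not 0 <= start < step:
        raise ValueError("start must fall within the first interval")
    return [items[min(int(start + i * step), n - 1)] for i in range(sample_size)]


def reduce_sample(sample, has_sat_scores):
    """Keep only the examinees for whom SAT scores are available."""
    return [examinee for examinee in sample if has_sat_scores(examinee)]


# Hypothetical data: 1,000 answer sheets identified by serial number.
answer_sheets = list(range(1, 1001))
sample = systematic_sample(answer_sheets, 100)
# Hypothetical availability rule, used only to make the example runnable.
reduced = reduce_sample(sample, lambda sheet: sheet % 7 != 0)
```

A reduced systematic sample built this way is smaller than the systematic sample it came from, which is why the tables report a separate number of examinees for it.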

In every instance the mean scores in Tables 8, 9, and 10 are for a systematic 
sample or for a reduced systematic sample. The mean scores are for a systematic 
sample if no number of examinees is specified for a reduced systematic sample. 

For all achievement tests given in May 1963 and thereafter, the significance of 
the difference between the achievement test mean score of all examinees in a course 
category (data in Appendix V) and the achievement test mean score of the sample 
shown in Tables 8, 9, or 10 was determined. The sample means that differed signifi- 
cantly from their population means at the .05 or .01 level are designated by appropriate 
footnotes in Tables 8, 9, and 10. As discussed in the Design Section an important 
consideration in each comparison was that the regular course sample mean not differ 
from the regular course population mean by a significantly different amount than the 
special course sample mean differed from the special course population mean. Com- 
parisons for which the differences between differences were significant at the .05 and 
.01 levels are identified by a’s and A’s, respectively, in Tables 8, 9, and 10. 

The data on the actual test performances of the examinees that appear in
Tables 8, 9, and 10 will now be presented in graphical form. In Figures 5, 6, and 7 the
actual mean scores on the complete science achievement tests of the populations and
of the samples of examinees from different courses will be presented. These graphs
allow one to identify at a glance the samples that differed markedly from their popula-
tions. Figures 8 through 13 pertain only to the samples. In Figures 8, 9, and 10,
the actual mean scores of samples of biology examinees are shown. Similar data
for chemistry are presented in Figures 11 and 12, and the analogous data for physics
appear in Figure 13.
•Sample mean significantly higher than population mean at .01 level
'Sample mean significantly lower than population mean at .05 level

TABLE 10

Actual Performance of Samples of Physics Examinees on the Physics Achievement Tests, on Subsets of Questions in the
Physics Achievement Tests Rated Appropriate for Both Courses under Comparison, and on the Scholastic Aptitude Test

Samples were drawn and analyses of test appropriateness were done for Biology Tests
given on six of the test dates listed in Figure 5; these six dates were May 1962, May 1963,
January 1966, May 1966, January 1967, and May 1967. The performance of four samples
of students on each of these six tests was analyzed. Twenty of the 24 sample means did
not differ significantly from their population means. The four sample means that did
differ significantly from their population means were as follows:

1. May 1962, BSCS-Yellow, Sample mean significantly higher than population mean
at .05 level

2. May 1963, BSCS-Green, Sample mean significantly higher than population mean
at .01 level

3. May 1966, Regular, Sample mean significantly higher than population mean at
.05 level

4. May 1967, BSCS-Yellow, Sample mean significantly lower than population mean
at .01 level

For two comparisons (BSCS-Green and regular, May 1966 and BSCS-Yellow and 
regular, May 1966) the differences between regular sample and population means were 
significantly different from the differences between BSCS sample and population means 
at the .05 level. The one comparison that will be ruled out of consideration because 
the difference between the regular sample and population means is significantly differ- 
ent from the difference between the BSCS sample and population mean at the .01 level 
sticks out plainly in Figure 5. It is the comparison between regular Biology and 
BSCS-Green in May 1963. 

Samples were drawn and analyses of test appropriateness were done for Chemistry 
Tests given on eight of the test dates shown in Figure 6. January 1964 was the single 
omission. Twenty-two of the sample means did not differ significantly from their 
population means. The two sample means that did differ significantly from their
population means were as follows: 

1. January 1965, Regular, Sample mean significantly higher than population mean 
at .01 level 

2. May 1967, CBA, Sample mean significantly lower than population mean at 
.05 level 

Figure 5

Biology - Mean Scores on the Complete Biology Achievement Tests of Populations and Samples of Examinees from
Different Biology Courses

The two comparisons that will be ruled out of consideration because differences
between Regular Chemistry sample and population means are significantly different 
from differences between special course sample and population means are immediately 
apparent in Figure 6. They are the comparisons between CBA and regular and between 

CHEMS and regular in January 1965. 

Samples were drawn and analyses of test appropriateness were done for Physics 
Tests given on all of the test dates shown in Figure 7 except May 1965. Three of the 
16 sample means did differ significantly from their population means. They were as 

follows: 

1. May 1966, Regular, Sample mean significantly higher than population mean 
at .01 level 

2. May 1967, PSSC, Sample mean significantly lower than population mean at 
.01 level 

3. May 1967, Regular, Sample mean significantly lower than population mean 
at .05 level 

Fortunately, in each comparison the difference between the regular sample and
population means was nearly the same as the difference between the PSSC sample
and population means. None of the differences differ significantly at the .01 level; none
of the comparisons need be ruled out.

In Figures 8 through 13 the actual mean scores of samples of examinees from
different courses are shown. In any one figure the mean scores of examinees from
only two courses are arrayed. One of those courses is the regular course, and dashed
lines are used throughout to indicate the mean scores of the regular-course examinees.
Attention is directed to the differences between mean scores of samples on complete
achievement tests and the corresponding differences between the means of the same
samples on the subtest and on SAT-V and M. For example, if course X examinees
score higher than course Y examinees on a complete achievement test, one would
expect course X examinees to score higher on the appropriate-for-both subtest and
on SAT-V and M, also. If they do, there is evidence that some or all of the superior
performance of the X examinees on the achievement test can be ascribed to their
superior science achievement and scholastic aptitude. On the other hand, if X
examinees score higher on the achievement test, but lower on the appropriate-for-
both subtests and on SAT, there is evidence of achievement test bias in favor of
course X. The following figures are a useful first approach to the differing perform-
ances of examinees from different courses on achievement tests and related
measures. No particular significance should be attached to the absolute value of
the mean scores on the appropriate-for-both subtests. These subtests had widely
differing numbers of items and all the subtest scores are certainly not on the same
scale.
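
The reasoning in the paragraph above reduces to a simple comparison of means. The sketch below is one way to express it; the function name and all of the scores are hypothetical, and it only flags the pattern without testing whether any difference is statistically significant.

```python
def favored_by_bias(ach_x, ach_y, concomitants_x, concomitants_y):
    """Return 'X' or 'Y' when the means suggest bias toward that course, else None.

    ach_x and ach_y are achievement test means for courses X and Y.
    concomitants_x and concomitants_y are parallel lists of means on the
    related measures (appropriate-for-both subtest, SAT-V, SAT-M).
    """
    if not concomitants_x or len(concomitants_x) != len(concomitants_y):
        raise ValueError("concomitant measures must be non-empty and parallel")
    pairs = list(zip(concomitants_x, concomitants_y))
    # Bias toward X: X leads on the achievement test but trails on every other measure.
    if ach_x > ach_y and all(x < y for x, y in pairs):
        return "X"
    if ach_y > ach_x and all(y < x for x, y in pairs):
        return "Y"
    return None


# Hypothetical means: subtest score, SAT-V, SAT-M for each course.
flagged = favored_by_bias(560, 540, [14.1, 520, 550], [14.6, 535, 560])
not_flagged = favored_by_bias(560, 540, [15.0, 540, 570], [14.6, 535, 560])
```

Requiring the reversal on every related measure keeps the indicator conservative: a course that leads on the achievement test and on even one other measure is not flagged.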

The data in Figure 8 show that, except for the May 1962 Biology Test, the
BSCS-Blue examinees score consistently higher than the regular examinees on the
complete Biology Achievement Tests, on the appropriate-for-both subtests, and on
SAT-V and M.

For all six of the tests shown in Figure 9 the examinees that score higher on the
achievement tests also score higher on the appropriate-for-both subtests and on
SAT-V and M. For five test dates the regular examinees score higher than the
BSCS-Green examinees on all measures. The BSCS-Green sample of May 1963
was very atypical and not representative of its population. The sample mean was
567 whereas the population mean was 505. The sample mean was significantly
higher than the population mean at the .01 level. Furthermore, the difference
between the BSCS-Green sample and population means was significantly different
from the difference between the Regular Biology sample and population means at the
.01 level. The comparison between BSCS-Green and regular in May 1963 will not be
analyzed or considered further.

Figure 8

Biology: BSCS-Blue and Regular Sample Means on the Complete Biology Achievement Tests,
on Subsets of the Biol. Ach. Tests Rated Appropriate for Both Courses, and on SAT - V
and M

Figure 9

Biology: BSCS-Green and Regular Sample Means on the Complete Biology Achievement Tests,
on Subsets of the Biol. Ach. Tests Rated Appropriate for Both Courses, and on SAT - V
and M

On all six test dates in Figure 10, the BSCS-Yellow examinees score higher than
the regular examinees on the achievement tests and on the appropriate-for-both
subtests. The SAT means are closely clustered, but in May 1963 the regular examinees
scored higher than the BSCS-Yellow examinees on SAT, suggesting possible bias in
this achievement test in favor of BSCS-Yellow students.

In Figures 11 and 12 only chemistry data from May 1963 and later are presented
although earlier data are given in Table 9. The reason is that the chemistry investi-
gations prior to May 1963 did not involve ratings from teachers of regular courses;
the test questions were simply assumed to be appropriate for regular students.
Beginning with the May 1963 test the appropriate-for-both subtests were identified
on the basis of ratings for both regular and special courses. The result of this
change was an appreciable drop in the number of questions in the appropriate-for-
both subtests.

For January 1965 in chemistry the difference between the CBA sample and
population means and the difference between the CHEMS sample and population
means were both significantly different from the difference between the regular
sample and population means at the .01 level. Neither chemistry comparison for
January 1965 will be analyzed or considered further.
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Figure 10 

Biology: BSCS-Yellow and Regular Sample Means on the Complete Biology Achievement Tests, 

on Subsets of the Biol. Ach. Tests Rated Appropriate for Both Courses, and on SAT — V 

and M 
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Figure 11 

Chemistry: CBA and Regular Sample Means on the Complete Chemistry Achievement Tests, 

on Subsets of the Chem. Ach. Tests Rated Appropriate for Both Courses, and on SAT - V 

and M 
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Figure 12 



Chemistry: CHEMS and Regular Sample Means on the Complete Chemistry Achievement Tests, 
on Subsets of the Chem. Ach. Tests Rated Appropriate for Both Courses, and on SAT-V and M 
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Figure 13 



Physics: PSSC and Regular Sample Means on the Complete Physics Achievement Tests, on 
Subsets of the Physics Ach. Tests Rated Appropriate for Both Courses, and on SAT-V and M 
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A major feature of the actual performance data presented in Tables 8-10 and 
depicted for the most part in Figures 8-13 is that the students from the course in 
each comparison who score higher on the achievement test almost always score 
higher on the appropriate-for-both items and on SAT-V and M. If one takes as the 
indication of bias that one course in a comparison have the higher mean on the 
achievement test and the other course have the higher mean(s) on all available 
concomitant measures, the data can be summarized as shown in the chart below. 



Subject     Comparison               Comment

Biology     BSCS-Blue & Regular      No indications of bias
            BSCS-Green & Regular     No indications of bias
            BSCS-Yellow & Regular    No indications of bias

Chemistry   CBA and Regular          There are indications of bias in
                                     favor of regular students on two
                                     tests, May 1962 and May 1963.

            CHEMS and Regular        On three tests, December 1961,
                                     May 1963, and January 1966,
                                     there are indications of bias in
                                     favor of regular students.

Physics     PSSC and Regular         There is an indication of bias
                                     in favor of regular students on
                                     the January 1965 test.
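
The screening rule used to build the chart above can be written out as a short sketch. The function name, the dictionary layout, and the sample means below are hypothetical and are not taken from Tables 8-10; the sketch only restates the rule that one course must lead on the achievement test while the other leads on every available concomitant measure.

```python
def bias_indication(special, regular):
    """Apply the screening rule to one comparison.

    Each argument maps a measure name to a sample mean. The key
    "achievement" holds the achievement-test mean; every other key is
    treated as a concomitant measure (subset, SAT-V, SAT-M).
    Returns the favored course ("special" or "regular"), or None when
    the rule gives no indication of bias.
    """
    concomitants = [name for name in special if name != "achievement"]
    if not concomitants:
        return None
    if (special["achievement"] > regular["achievement"]
            and all(regular[m] > special[m] for m in concomitants)):
        return "special"
    if (regular["achievement"] > special["achievement"]
            and all(special[m] > regular[m] for m in concomitants)):
        return "regular"
    return None


# Hypothetical means for illustration only.
special = {"achievement": 540, "subset": 31.0, "sat_v": 560, "sat_m": 590}
regular = {"achievement": 552, "subset": 30.2, "sat_v": 548, "sat_m": 581}
print(bias_indication(special, regular))  # regular

# A mixed pattern (special leads on the subset but not on SAT) gives no indication.
mixed = dict(special, sat_v=540, sat_m=575)
print(bias_indication(mixed, regular))  # None
```

Under this rule a comparison is flagged only when every concomitant measure points the other way, which is why a reversal on SAT alone does not produce an entry in the chart.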



For the January and May 1967 Biology, Chemistry, and Physics Tests, checks on the 
accuracy of student responses regarding courses studied were made. These checks 
involved asking schools to verify or deny the responses of their students regarding the 
courses they studied. In Table 11 the test performances in January and May 1967 of 
student-response samples and of school-verified samples are presented. Each school-verified 
sample is a subsample of the corresponding student-response sample. 

The data in Table 11 are from 18 student-response samples and 18 corresponding 
school-verified samples; there are eight samples of each kind in biology, six in 
chemistry, and four in physics. In 17 of the 18 cases, the school-verified sample 
had a higher mean score on the achievement test than the corresponding 
student-response sample. The lone exception is the Regular Biology sample of January 1967. 






TABLE 11 

Actual Performance of Student-Response and School-Verified Samples of Biology, Chemistry, and Physics Examinees from January and May 1967 on the 
Biology, Chemistry, and Physics Achievement Tests, respectively; on Subsets of the Biology, Chemistry, and Physics Tests 
Rated Appropriate for Courses under Comparison, respectively; and on the Scholastic Aptitude Test 
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In all 18 cases the differences in mean scores on the complete achievement test of 
student-response and school-verified samples are accompanied by differences in the 
same direction in the mean scores on the appropriate-for-both subtest and on SAT-V 
and M. 

Considering the 12 comparisons based on school-verified samples, for six of six 
comparisons in biology and both comparisons in physics higher means on the science 
achievement tests are associated with higher means on all three concomitant variables. 
This same pattern holds in chemistry for the appropriate-for-both questions, but there 
are some exceptions in the relationship between performance on the Chemistry 
Achievement Test and SAT. 

E. Adjusted Performances of Examinees in Different Courses on the Science Achievement 
Tests After Taking Account of Their Performances on Subsets of the Achievement 
Tests Rated Appropriate for Different Courses and on SAT 

The mean scores on the science achievement tests expected of students in the regular 
and special courses who were equivalent in performance on the appropriate-for-both 
subsets and on SAT will be presented in Tables 12, 13, and 14 for biology, chemistry, 
and physics, respectively. The most valid adjustments and comparisons are for those 
instances where there were no significant differences between the two courses under 
comparison in terms of three considerations: (1) the difference between the 
regular-course population and sample means not significantly different from the difference 
between the special-course population and sample means; (2) the regression-line slope 
for the regular course not significantly different from the regression-line slope for 
the special course; (3) the standard error of estimate for the regular course not 
significantly different from the standard error of estimate for the special course. 
Significant differences in each of these three categories at the .05 and .01 levels are 
identified by lower-case and capital letters, respectively, in the tables. No adjusted 
means are presented in those instances where differences in any one of the three 
categories were significant at the .01 level. This will preclude the possibility of 
attaching importance to the results of invalid comparisons. Adjusted means that were 
significantly different at the .05 and .01 levels are marked by d's and D's, respectively. 
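
The kind of adjustment described above can be illustrated with a minimal sketch of a textbook one-covariate analysis of covariance. This is an assumption about the general technique, not a reproduction of the report's computations: the function name and the toy scores are hypothetical, only one concomitant variable is used, and a common (pooled within-course) regression slope is assumed, which is the condition that consideration (2) is meant to check.

```python
def adjusted_means(groups):
    """One-covariate analysis-of-covariance adjustment.

    groups maps a course label to a list of (covariate, achievement)
    score pairs. Returns (pooled_slope, {label: adjusted_mean}), where
    each adjusted mean is the achievement mean expected if the course
    had scored at the grand mean on the covariate.
    """
    all_x = [x for pairs in groups.values() for x, _ in pairs]
    grand_x = sum(all_x) / len(all_x)

    sxy = 0.0  # pooled within-course sum of cross-products
    sxx = 0.0  # pooled within-course sum of squares of the covariate
    means = {}
    for label, pairs in groups.items():
        mx = sum(x for x, _ in pairs) / len(pairs)
        my = sum(y for _, y in pairs) / len(pairs)
        means[label] = (mx, my)
        sxy += sum((x - mx) * (y - my) for x, y in pairs)
        sxx += sum((x - mx) ** 2 for x, _ in pairs)

    slope = sxy / sxx
    adjusted = {label: my - slope * (mx - grand_x)
                for label, (mx, my) in means.items()}
    return slope, adjusted


# Toy scores for illustration only: both courses follow a slope of 2,
# but the special course scores higher on the covariate.
toy = {
    "regular": [(1, 12), (2, 14), (3, 16)],
    "special": [(3, 11), (4, 13), (5, 15)],
}
slope, adj = adjusted_means(toy)
print(slope, adj)
```

In the toy data the observed achievement means are 14 and 13, a difference of 1 point, while the adjusted means differ by 5 points once the covariate difference is taken into account.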

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TABLE 12 



Adjusted Performances of Samples of Examinees on the Biology Achievement Tests: 
Mean Scores to be Expected of Examinees Who Performed Equivalently on the 
Subsets of the Achievement Tests Rated Appropriate for Both Courses 
and on SAT-V and M 

Expected Mean Scores on Achievement Tests and Differences 
Between Means After Adjustment for Performances on: 

Test       Courses Under  Number of  Subset of      SAT-V          Both Ach. Subset
Date       Comparison     Examinees  Ach. Test      and M          and SAT-V and M
                                     Means Diff.    Means Diff.    Means Diff.

May 1962   BSCS-Blue          30       511   0        504  -8        510  -1
           & Reg. Biol.      154       511            512            511

May 1962   BSCS-Green         80       488 -14        491 -10        496  -2
           & Reg. Biol.      154       502            501            498

May 1962   BSCS-Yellow        71       514  -5        529  17        518   1 c
           & Reg. Biol.      154       519            512            517

May 1963   BSCS-Blue         112       528 -10        535   3 b      527 -12
           & Reg. Biol.      131       538            532            539

May 1963   BSCS-Green         36             A              A              A
           & Reg. Biol.      131

May 1963   BSCS-Yellow        53       535   1        574  55 D      547  18 d
           & Reg. Biol.      131       534            519            529

Jan. 1966  BSCS-Blue         438       546   8 d      549  16 bD             B
           & Reg. Biol.      461       538            533

Jan. 1966  BSCS-Green        357       515  -1        517   2        519   4 b
           & Reg. Biol.      461       516            515            515

Jan. 1966  BSCS-Yellow       411       534   8 cd     533     d      533   6 d
           & Reg. Biol.      461       526            527            527

May 1966   BSCS-Blue         463       561   0        563   5        559  -3
           & Reg. Biol.      499       561            558            562

May 1966   BSCS-Green        403       535  -1 a      531  -8 a      536   1 a
           & Reg. Biol.      499       536            539            535

May 1966   BSCS-Yellow       497       547  -5 a      551   4 a      548  -3 ab
           & Reg. Biol.      499       552            547            551

(Table continued on next page.) 
TABLE 12 (con't) 

Expected Mean Scores on Achievement Tests and Differences 
Between Means After Adjustment for Performances on: 

Test       Courses Under  Number of  Subset of      SAT-V          Both Ach. Subset
Date       Comparison     Examinees  Ach. Test      and M          and SAT-V and M
                                     Means Diff.    Means Diff.    Means Diff.

Student-Response Samples for January and May 1967 Tests 

Jan. 1967  BSCS-Blue         500       518 -13 D      531  13 D      520  -9 D
           & Reg. Biol.      498       531            518            529

Jan. 1967  BSCS-Green        500       501  -7 d      508   7        504  -1
           & Reg. Biol.      498       508            501            505

Jan. 1967  BSCS-Yellow       496       515  -8 D      525  13 D      517  -4
           & Reg. Biol.      498       523            512            521

May 1967   BSCS-Blue         500       534 -13 D      543   4        534 -13 D
           & Reg. Biol.      499       547            539            547

May 1967   BSCS-Green        500       520  -4        522   0        521  -2
           & Reg. Biol.      499       524            522            523

May 1967   BSCS-Yellow       500       533  -8 D      543  12 D      534  -6 D
           & Reg. Biol.      499       541            531            540

School-Verified Samples for January and May 1967 Tests 

Jan. 1967  BSCS-Blue         378       525 -11 D      534  11 d      525 -11 D
           & Reg. Biol.      271       536            523            536

Jan. 1967  BSCS-Green        342       513  -1        516   5        513  -1
           & Reg. Biol.      271       514            511            514

Jan. 1967  BSCS-Yellow       370       517  -8 d      526  13 d      518  -5
           & Reg. Biol.      271       525            513            523

May 1967   BSCS-Blue         420       540 -16 D      548   3        541 -14 D
           & Reg. Biol.      314       556            545            555

May 1967   BSCS-Green        375       534   0        535   1        534  -1
           & Reg. Biol.      314       534            534            535

May 1967   BSCS-Yellow       432       538  -7 D      545  10        538  -6 D
           & Reg. Biol.      314       545            535            544

a  Differences between regular-course sample and population observed means are significantly different from 
   differences between special-course sample and population observed means at .05 level 
A  Differences between regular-course sample and population observed means are significantly different from 
   differences between special-course sample and population observed means at .01 level 
b  Regression-line slopes are significantly different at .05 level 
B  Regression-line slopes are significantly different at .01 level 
c  Standard errors of estimate are significantly different at .05 level 
C  Standard errors of estimate are significantly different at .01 level 
d  Adjusted means are significantly different at .05 level 
D  Adjusted means are significantly different at .01 level 
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TABLE 13 

Adjusted Performances of Samples of Examinees on the Chemistry Achievement Tests: 
Mean Scores to be Expected of Examinees Who Performed Equivalently on the 
Subsets of the Achievement Tests Rated Appropriate for Both Courses 
and on SAT-V and M 

Expected Mean Scores on Achievement Tests and Differences 
Between Means After Adjustment for Performances on: 

Test       Courses Under  Number of  Subset of      SAT-V          Both Ach. Subset
Date       Comparison     Examinees  Ach. Test      and M          and SAT-V and M
                                     Means Diff.    Means Diff.    Means Diff.

Dec. 1961  CBA &              61       551  -7 D
           Reg. Chem.        276       558

Dec. 1961  CHEMS &            32       552  -9 d
           Reg. Chem.        276       561

Mar. 1962  CBA &             107       519  -8 D
           Reg. Chem.        126       527

Mar. 1962  CHEMS &            52       517 -12 D
           Reg. Chem.        126       529

May 1962   CBA &             310             C
           Reg. Chem.        350

May 1962   CHEMS &           105             B
           Reg. Chem.        350

May 1963   CBA &             370       552 -33 cD
           Reg. Chem.        370       585

May 1963   CHEMS &           370       511 -11 cD
           Reg. Chem.        370       585

May 1964   CBA &             365       533 -32 bD
           Reg. Chem.        225       565

May 1964   CHEMS &           369       538 -32 D
           Reg. Chem.        225       570

Jan. 1965  CBA &             358             A
           Reg. Chem.        358

Jan. 1965  CHEMS &           337             A
           Reg. Chem.        358

May 1965   CBA &             361       561 -10 D
           Reg. Chem.        370       571

May 1965   CHEMS &           370       568  -3
           Reg. Chem.        370       571

Jan. 1966  CBA &             500       551  -9 bD
           Reg. Chem.        500       560

Jan. 1966  CHEMS &           500       566 -11 D
           Reg. Chem.        500       577

(Table continued on next page.) 
TABLE 13 (con't) 



Expected Mean Scores on Achievement Tests and Differences 
Between Means After Adjustment for Performances on: 

Test       Courses Under  Number of  Subset of      SAT-V          Both Ach. Subset
Date       Comparison     Examinees  Ach. Test      and M          and SAT-V and M
                                     Means Diff.    Means Diff.    Means Diff.

May 1966   CBA &             496       550  -8 d           BC              B
           Reg. Chem.        497       558

May 1966   CHEMS &           493       563   1              B        563   2 b
           Reg. Chem.        497       562                           561

Student-Response Samples for January and May 1967 Tests 

Jan. 1967  CBA &             439       578   4 d      575  -2        578   4 d
           Reg. Chem.        500       574            577            574

Jan. 1967  CHEMS &           500             B        594  10 d      596  14 D
           Reg. Chem.        500                      584            582

May 1967   CBA &             500       544  -9 D      541 -14 D      544  -8 D
           Reg. Chem.        500       553            555            552

May 1967   CHEMS &           500       573   6 d      575  10 d      573   6 D
           Reg. Chem.        500       567            565            567

School-Verified Samples for January and May 1967 Tests 

Jan. 1967  CBA &             260       592  10 D      585  -1        592  10 D
           Reg. Chem.        404       582            586            582

Jan. 1967  CHEMS &           371             B        603  13 a              C
           Reg. Chem.        404                      590

May 1967   CBA &             307       559  -6        555 -13 d      559  -6 d
           Reg. Chem.        415       565            568            565

May 1967   CHEMS &           403       578   6 d      580   9        578   6 D
           Reg. Chem.        415       572            571            572



a  Differences between regular-course sample and population observed means are significantly different from 
   differences between special-course sample and population observed means at the .05 level 
A  Differences between regular-course sample and population observed means are significantly different from 
   differences between special-course sample and population observed means at the .01 level 
b  Regression-line slopes are significantly different at .05 level 
B  Regression-line slopes are significantly different at .01 level 
c  Standard errors of estimate are significantly different at .05 level 
C  Standard errors of estimate are significantly different at .01 level 
d  Adjusted means are significantly different at .05 level 
D  Adjusted means are significantly different at .01 level 
TABLE 14 

Adjusted Performances of Samples of Examinees on the Physics Achievement Tests: 
Mean Scores to be Expected of Examinees Who Performed Equivalently on the 
Subsets of the Achievement Tests Rated Appropriate for Both Courses 
and on SAT-V and M 

Expected Mean Scores on Achievement Tests and Differences 
Between Means After Adjustment for Performances on: 

Test       Courses Under  Number of  Subset of      SAT-V          Both Ach. Subset
Date       Comparison     Examinees  Ach. Test      and M          and SAT-V and M
                                     Means Diff.    Means Diff.    Means Diff.

May '63 and Jan. '64 Tests Using Questions for Subset Chosen with Emphasis on Selecting Questions 
Whose Appropriateness Ratings had High Means and Low Variances 

May 1963   PSSC &            369       572 -12 cD     582   8 b      573 -10 D
           Reg. Phys.        359       584            574            583

Jan. 1964  PSSC &            356       558   2        551 -13 cd     555  -4
           Reg. Phys.        363       556            564            559

May '63 and Jan. '64 Tests Using Questions for Subset Chosen with Emphasis on Selecting Questions 
with Equal Mean Appropriateness Ratings for PSSC and Regular Courses 

May 1963   PSSC &            369       580   4        582   8        578   0
           Reg. Phys.        359       576            574            578

Jan. 1964  PSSC &            355       551 -12 D      551 -13 cD     550 -15 bD
           Reg. Phys.        363       563            564            565

May 1964   PSSC &            330       573   6        570   0        571   2
           Reg. Phys.        337       567            570            569

Jan. 1965  PSSC &            313       575  -2        569 -13 d      572  -7 a
           Reg. Phys.        332       577            582            579

Jan. 1966  PSSC &            476       589            585  -3        587   2
           Reg. Phys.        457       583            588            585

May 1966   PSSC &            514       623   -        620 -10 d      621  -9 cD
           Reg. Phys.        538       628            630            630

Student-Response Samples for January and May 1967 Tests 

Jan. 1967  PSSC &            500       573  -7 d            B        573  -7 D
           Reg. Phys.        498       580                           580

May 1967   PSSC &            500       588 -13 D      598   7        589 -11 D
           Reg. Phys.        500       601            591            600

School-Verified Samples for January and May 1967 Tests 

Jan. 1967  PSSC &            404       579  -6        583   2        579  -6 d
           Reg. Phys.        392       585            581            585

May 1967   PSSC &            392       595 -13 D      608  14 D      596 -11 D
           Reg. Phys.        392       608            594            607



a  Differences between regular-course sample and population observed means are significantly different from 
   differences between special-course sample and population observed means at the .05 level 
A  Differences between regular-course sample and population observed means are significantly different from 
   differences between special-course sample and population observed means at the .01 level 
b  Regression-line slopes are significantly different at .05 level 
B  Regression-line slopes are significantly different at .01 level 
c  Standard errors of estimate are significantly different at .05 level 
C  Standard errors of estimate are significantly different at .01 level 
d  Adjusted means are significantly different at .05 level 
D  Adjusted means are significantly different at .01 level 
For the May 1963 Biology Test the data are for school-verified samples; for the January 
and May 1967 tests in biology, chemistry, and physics data for both student-response and 
school-verified samples are presented. For the other tests the data are for 
student-response samples. 

The adjustments reported in Tables 12, 13, and 14 are reasonable and proper only 
if fairly strong positive correlations exist between scores on the achievement tests and 
scores on the concomitant variables. The correlations between achievement test scores 
and scores on the appropriate-for-both questions cluster a little below .90. These 
correlations are spuriously high since the appropriate-for-both questions were in the 
achievement tests. The correlations between achievement scores and SAT scores tend 
to be around .65. The correlations were judged to be sufficiently strong to make the 
adjustment of achievement test scores for performance on the concomitant measures 
reasonable and proper. 
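
The point that a part-whole correlation is spuriously high can be illustrated with a small sketch. It is a constructed example, not an analysis of the report's data: it assumes a hypothetical 10-question test of equally difficult, statistically independent right/wrong questions, with 8 of them forming the subset, and enumerates every response pattern. Even though the questions share nothing, the subset score correlates strongly with the total score simply because the subset is contained in the total.

```python
from itertools import product
from math import sqrt


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


N_ITEMS = 10   # hypothetical test length
N_SUBSET = 8   # hypothetical number of appropriate-for-both questions

# Every possible right/wrong pattern, each equally likely, so the
# questions are independent of one another by construction.
patterns = list(product((0, 1), repeat=N_ITEMS))
subset = [sum(p[:N_SUBSET]) for p in patterns]
rest = [sum(p[N_SUBSET:]) for p in patterns]
total = [sum(p) for p in patterns]

r_part_whole = pearson(subset, total)
r_part_rest = pearson(subset, rest)
print(round(r_part_whole, 3), round(r_part_rest, 3))
```

In this construction the subset correlates zero with the remaining questions, yet its correlation with the total equals the square root of 8/10. A correlation between a subset and the test that contains it therefore overstates how well the subset predicts independent achievement.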

Tables 12, 13, and 14 contain the adjusted mean scores of students of regular and 
special courses on six Biology Tests, nine Chemistry Tests, and eight Physics Tests. 

The differences between the adjusted mean scores of students from regular and special 
courses are categorized in Table 15 by course comparisons and by direction and magni- 
tude. The data in Table 15 are based on student-response samples; the data for the 
school-verified samples in January and May, 1967 are not included in the categorization 
displayed in Table 15. The comparisons in physics on the May 1963 and January 1964 
tests using the appropriate-for-both questions chosen with emphasis on selecting 
questions whose ratings had high means and low variances are also excluded. The 
comparisons for which no adjusted means are shown in Tables 12, 13, and 14 are not 
shown in Table 15. 

The data in Table 15 make it clear that there is little bias in the science achievement 
tests. When the observed achievement means are adjusted for performance on the 
appropriate-for-both questions, then for none of 17 comparisons in biology, for four of 
17 in chemistry, and for none of eight comparisons in physics do the differences 
between adjusted means of regular and special students exceed 15 scale score points. 
If means are adjusted for performance on SAT only, then for three of 17 comparisons 










Each X represents a difference between adjusted mean scores on a test form. 
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in biology, for none of four in chemistry, and for none of seven in physics do the 
differences between the adjusted means of regular and special students exceed 
15 scale score points. If means are adjusted for performance on the appropriate- 
for-both questions and on SAT, then for only one of 16 comparisons in biology, for 
none of five in chemistry, and for none of eight in physics do the differences between 
the adjusted means of regular and special students exceed 15 scale score points. 
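
The tally in the preceding paragraph can be reproduced for one case as a short sketch. The list below is a transcription of the biology differences (special minus regular) for student-response samples from the column of Table 12 headed "Both Ach. Subset and SAT-V and M"; the variable and function names are hypothetical, and the 15-point cutoff is the one used in the text.

```python
THRESHOLD = 15  # scale score points, the cutoff used in the text

# Biology, student-response samples, adjusted for both the subset and SAT.
# Order: May 1962 (Blue, Green, Yellow), May 1963 (Blue, Yellow),
# Jan. 1966 (Green, Yellow), May 1966, Jan. 1967, May 1967 (Blue, Green, Yellow).
biology_both = [-1, -2, 1, -12, 18, 4, 6, -3, 1, -3, -9, -1, -4, -13, -2, -6]


def count_exceeding(diffs, threshold=THRESHOLD):
    """Count differences whose absolute size exceeds the threshold."""
    return sum(1 for d in diffs if abs(d) > threshold)


n_large = count_exceeding(biology_both)
print(f"{n_large} of {len(biology_both)}")  # 1 of 16
```

The single difference above the cutoff is the 18-point difference for BSCS-Yellow and Regular Biology on the May 1963 test.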

The comparisons shown in Table 15 and discussed in the preceding paragraph 
are for student-response samples. For the January and May 1967 tests in biology, 
chemistry, and physics comparisons were made for both student-response samples 
and school-verified samples. It is assumed that comparisons based on 
school-verified samples are valid because these samples contain few misplaced students. 

If the comparisons based on student-response samples yield results closely similar 
to those based on school-verified samples in January and May 1967, then it is 
reasonable to assume that all of the earlier comparisons based on student-response 
samples only are as valid as the comparisons based on school-verified samples. 
Scrutiny of the results from the two types of samples for 1967 in Tables 12, 13, 
and 14 shows that the results are closely similar. From the three tables one can 
read out for the January and May 1967 tests a total of 33 differences between adjusted 
means based on student-response samples and contrast these with a total of 33 
differences between adjusted means based on the corresponding school-verified samples. 
In eight cases the differences based on the two types of samples are identical, in 
21 cases the differences differ by 4 points or less, and in only four instances do the 
differences differ by more than 4 scale score points. It seems entirely reasonable 
to conclude that the comparisons based on student-response samples are as valid as 
comparisons based on school-verified samples. 
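
The classification used above (identical, differing by 4 points or less, differing by more than 4 points) can be sketched as follows. The pairs are a transcription of the biology portion of Table 12 for January and May 1967 only, so they cover 18 of the 33 pairs and the counts are not expected to match the totals quoted in the text; the names are hypothetical.

```python
# (student-response difference, school-verified difference) for biology,
# January and May 1967, in the order subset, SAT, both for each comparison.
biology_1967 = [
    (-13, -11), (13, 11), (-9, -11),   # Jan. BSCS-Blue
    (-7, -1), (7, 5), (-1, -1),        # Jan. BSCS-Green
    (-8, -8), (13, 13), (-4, -5),      # Jan. BSCS-Yellow
    (-13, -16), (4, 3), (-13, -14),    # May BSCS-Blue
    (-4, 0), (0, 1), (-2, -1),         # May BSCS-Green
    (-8, -7), (12, 10), (-6, -6),      # May BSCS-Yellow
]


def classify_agreement(pairs, tolerance=4):
    """Count pairs that are identical, within tolerance, or further apart."""
    counts = {"identical": 0, "within": 0, "beyond": 0}
    for student, verified in pairs:
        gap = abs(student - verified)
        if gap == 0:
            counts["identical"] += 1
        elif gap <= tolerance:
            counts["within"] += 1
        else:
            counts["beyond"] += 1
    return counts


agreement = classify_agreement(biology_1967)
print(agreement)  # {'identical': 4, 'within': 13, 'beyond': 1}
```

For biology alone, only the January BSCS-Green comparison adjusted on the subset differs by more than 4 points between the two kinds of samples.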

In chemistry, considering only student-response samples, there were 17 valid 
comparisons (nine between CBA and regular and eight between CHEMS and regular) 
in which adjustments were made for performance on appropriate-for-both questions, 
but only five valid comparisons (two between CBA and regular and three between 
CHEMS and regular) in which adjustments were made for performance on 
appropriate-for-both questions and SAT. This procedure resulted from the thought that the 
appropriate-for-both adjustment was the more appropriate, critical, and meaningful one 
to make. The data in Table 13 show that the adjustments using appropriate-for-both 
questions only yield results nearly identical to those obtained when adjustments for all 
concomitant variables are made. In no instance do the differences between adjusted 
means resulting from adjusting for only the one concomitant variable differ by more 
than 1 point from the differences between means resulting from adjusting for all 
concomitant variables. 

In biology and physics as well the adjustments for all concomitant variables yield 
results similar to the results of adjusting only for performance on the appropriate- 
for-both questions. However, the similarity is not quite as striking as that observed 
for chemistry. If one looks only at the January and May 1967 results for school- 
verified samples, it is clearly apparent that the adjustments involving all concomitant 
variables are much more closely similar to the adjustments involving the appropriate- 
for-both questions only than to those using SAT only. This is to be expected since 
correlations between the achievement test scores and appropriate-for-both question 
scores are higher than correlations between achievement test scores and SAT scores. 

A final topic for discussion is the differing adjustments that resulted for the 
May 1963 and January 1964 Physics Tests from using two different sets of 
appropriate-for-both questions. When the Physics Test means were adjusted for performance on 
a set of appropriate-for-both questions chosen with emphasis on picking questions 
that had PSSC and regular teacher ratings with high means and low variances, the 
regular adjusted mean exceeded the PSSC mean on the May 1963 test by 12 points, 
whereas the PSSC mean exceeded the regular mean by 2 points on the January 1964 
test. When the emphasis was on picking questions with equal PSSC and regular ratings 
the PSSC mean exceeded the regular mean by 4 points on the May 1963 test and the 
regular mean exceeded the PSSC mean by 12 points on the January 1964 test. These 
results indicate the sensitivity of the statistical adjusting procedure to the questions 
selected to serve as the course-free measure of science achievement. As discussed 
in Part C on teacher ratings, the procedure ultimately adopted took both 
considerations into account. Questions with high and equal regular and special ratings were 
selected for the measures of course-free science achievement. 
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VI. CONCLUSIONS 

The evidence clearly supports the general conclusion that the College Board Science 
Achievement Tests are equally appropriate for students of regular and special courses 
in biology, chemistry, and physics. 

The appropriateness of six Biology Tests for BSCS-Blue and Regular Biology was 
studied. On the first two tests there were indications from teacher ratings of bias in 
favor of Regular Biology, but these were not borne out by analyses of student test 
performance. On five of the six tests studied, the BSCS-Blue means were higher 
than the Regular Biology means. After adjusting for performance on the questions 
rated appropriate for both BSCS-Blue and regular, none of the six tests was 
found to be biased. An adjustment using all concomitant variables was excluded for 
one test because the regression-line slopes were significantly different at the .01 level. 
The adjustments on the remaining five tests using all concomitant variables revealed 
no bias. 

The appropriateness of six Biology Tests for BSCS-Green and Regular Biology was 
studied. On the first test, there was an indication from teacher ratings of bias in 
favor of Regular Biology, but this was not borne out by analysis of student test 
performance. The adjusted student performance on one of the six tests was excluded because 
the difference between the BSCS-Green population and sample means was significantly 
different at the .01 level from the difference between the Regular Biology population 
and sample means. On the remaining five Biology Tests the Regular Biology means 
were higher than BSCS-Green means in every case. Regular Biology means were 
higher than BSCS-Green means on all concomitant measures, also, in every case. 
After adjustment for performance on concomitant measures, none of the tests was 
found to be biased. 

The appropriateness of six Biology Tests for BSCS-Yellow and Regular Biology was 
studied. On the first two tests, there were indications from teacher ratings of bias in 
favor of Regular Biology, but these were not borne out by analysis of student test 
performance. In fact, on the second test, the Regular Biology students scored 
considerably higher on SAT-V and M, but the BSCS-Yellow students scored higher on the 
Biology Test. The analysis of covariance adjusting for all concomitant measures showed 
that this test was biased in favor of BSCS-Yellow. On all six Biology Tests, BSCS-Yellow 
means exceeded Regular Biology means. Adjustment only for performance on questions 
rated appropriate for both courses revealed no bias in any of the six tests. Teacher 
ratings indicated the fourth test was biased in favor of BSCS-Yellow. 

The appropriateness of 12 Chemistry Tests for CBA and Regular Chemistry was 
studied. The teacher ratings indicated that the fourth, fifth, and sixth tests were biased 
in favor of Regular Chemistry. On the fifth test no data on student performance are 
available, but on the fourth and sixth tests analysis of student performance clearly 
supported the teacher judgments. On these two Chemistry Tests, the regular students 
scored considerably higher, even though the CBA students scored higher or nearly as 
high on the questions rated appropriate for both courses. The performances of CBA and 
Regular Chemistry students on 11 Chemistry Tests were analyzed. One of these tests 
was excluded because the samples were not representative of the populations. On the 
ten remaining tests the means of Regular Chemistry students surpassed the means of 
CBA students in every case. Valid adjustments for performance on the questions rated 
appropriate for CBA and regular revealed two biased tests (the fourth and sixth); both 
favored Regular Chemistry. Means on four Chemistry Tests were adjusted for 
performance on all concomitant measures. One of those tests was the one already 
excluded because of nonrepresentative samples. Another was excluded because of 
significantly different regression-line slopes. For neither of the two remaining 
tests did adjustment for all concomitant variables reveal any bias. For the 
last five tests studied, there were no indications of bias from any source. 

The appropriateness of 12 Chemistry Tests for CHEMS and Regular Chemistry 
was studied. Evidence from teacher ratings indicated that the fourth, fifth, and sixth 
tests were biased in favor of Regular Chemistry, whereas the tenth test was biased in 
favor of CHEMS. Actual performance on the fourth and sixth tests supported the 
teacher judgments; the Regular Chemistry means on the Chemistry Tests were 
considerably higher than CHEMS means even though the regular and CHEMS students 
performed at nearly the same level on the questions rated appropriate for both courses. 
Student performance on the fifth test was not analyzed. For the 11 tests on which student 
performances were studied, one test was excluded because of nonrepresentative samples. 
On eight of the ten remaining tests the means of Regular Chemistry students were higher 
than the means of CHEMS students. Adjustments for performances on 
appropriate-for-both questions were ruled out on two tests because of significantly different 
regression-line slopes. On the eight remaining tests adjustments for performances on 
appropriate-for-both questions revealed bias in favor of Regular Chemistry in the case of 
two tests; these were the fourth and sixth tests. Means on four Chemistry Tests were 
adjusted for performance on all concomitant measures. One of these tests was the one 
already excluded because of unrepresentative samples. No bias was found in the three 
remaining tests after adjusting for all concomitant variables. 

The last Chemistry Test showing bias from adjusted performance data in favor of 
Regular Chemistry was introduced in May 1964. Five Chemistry Tests introduced since 
then have shown no bias except for an indication from actual performance of bias in favor 
of regular over CHEMS for the January 1966 test and an indication from teacher ratings 
of bias in favor of CHEMS over regular in the May 1966 test. Adjustments using only 
the appropriate-for-both questions and adjustments using all concomitant variables yield 
nearly identical results in chemistry. Hence, the small number of Chemistry Tests for 
which adjustments using all concomitant variables were made does not cast much, if 
any, doubt on the general conclusion that the most recent Chemistry Tests are unbiased. 

The appropriateness of eight Physics Tests for PSSC and Regular Physics was
studied. Evidence from teacher ratings indicated bias in the fourth and sixth tests in
favor of Regular Physics. One of these indications was supported by actual but not
adjusted performance data. The performances of PSSC and Regular Physics students
on eight Physics Tests were studied. On seven of the eight tests the PSSC means
exceeded the Regular Physics means. After adjusting for performance on the
appropriate-for-both questions and again after adjusting for performance on all concomitant
variables, no examples of bias were found.

In summary, the indications of bias in the Biology and Physics Tests were few in
number. The few indications that there were stemmed almost exclusively from
teacher ratings of appropriateness, and these teacher ratings were not borne out by
student performance. Both teacher ratings and analyses of student performance
indicated that Chemistry Tests introduced in 1963 and 1964 were biased in favor of
Regular Chemistry over CBA and CHEMS. On five Chemistry Tests introduced
since then, however, there were no indications of bias in favor of regular over CBA.
There was one indication from actual performance of bias in favor of regular over
CHEMS, but there was also one indication from teacher ratings of bias in favor of CHEMS
over regular. There were no indications of bias from adjusted performance data
on the last five Chemistry Tests studied.

APPENDIX I

SCIENCE ACHIEVEMENT TESTS
Question Rating Form

I have rated the questions in terms of their appropriateness for:

Modern Biology             Modern Chemistry     Modern Physics
BSCS (Blue Version)        CBA                  PSSC
BSCS (Green Version)       CHEMS
BSCS (Yellow Version)

No.    Answer    Appropriate       Appropriate       Inappropriate
                 And               But Not
                 Emphasized        Emphasized
22.
23.
24.

APPENDIX II



DIRECTIONS FOR RATING TEST QUESTIONS 



Check the course for which you are rating the appropriateness of the questions. 

For each question, enter the answer (A, B, C, D, or E) in the column headed Answer and then place
a check in one of the three columns that follow.

Check the column headed Appropriate And Emphasized if the question is based
on material that is emphasized in the course for which you are rating the
appropriateness of the questions.

Check the column headed Appropriate But Not Emphasized if the question is
one which you think students who have completed the course should be able
to answer even though the question is based on material not emphasized in
the course.

Check the column headed Inappropriate if you think that students who have
completed the course should not be expected to know the answer to the
question.

APPENDIX III

Total Numbers of Examinees Taking Each Science Achievement Test per School Year

                                      Year
Test          1962-63    1963-64    1964-65    1965-66    1966-67    1967-68

Biology        32,888     41,270     48,894     50,506     52,613     60,776
Chemistry      53,156     61,110     65,729     66,997     65,628     67,816
Physics        28,466 a   29,192     30,076     28,528     28,435     29,[illegible]

a Includes 27,152 examinees who took the Physics Achievement Test and 1,314 who
  took the PSSC Physics Achievement Test.








Numbers and Percentages of Examinees in 
Different Courses 